ABSTRACT

This report is intended to serve as a strategic review of the
potential for image processing techniques to aid the detection and classification of 
underwater mines and mine-like objects in various modes of sonar imagery. Image 
processing techniques to improve the performance of mine hunting operations using 
sector-scan, side-scan and the Acoustic Mine Imaging (AMI) project imagery are 
considered. Four basic components of any Computer-Aided Detection and 
Classification (CADCAC) technique are considered, namely, enhancement, 
segmentation, computer-aided detection, and computer-aided classification. In each of 
these fields, image processing techniques from the literature are examined and possible 
extensions or alternatives are discussed.

Applications of Image Processing
to Mine Warfare Sonar 


Executive Summary 

Information from various mine warfare sonar systems is often presented to the 
operator in a visual form. To obtain the optimum performance of these systems, it is 
desirable to apply intelligent processing techniques to the corresponding imagery. This 
report examines image processing techniques which may have the potential to improve 
either system or operator performance. The types of mine warfare sonar imagery 
examined in this report are sector-scan, side-scan, and the AMI project imagery. For 
each of these three types of imagery, applicable image processing concepts and 
techniques are examined with reference to techniques recorded in the literature.

1. Introduction


Sonar information collected while searching for, or identifying, underwater mines is 
often presented to the operator in the form of a two dimensional image. This is a result 
of the three-dimensional nature of the search domain and the human use of vision as 
the primary source of sensory information. The heavy human reliance on visual 
information has made human beings highly skilled at the detection and classification of 
objects in images. Despite human expertise at comprehending visual information, 
sonar imagery still presents many challenges since it lies outside the normal scope of 
human visual experience. 

Signal or image processing can be applied to the sonar data to help human operators 
detect and classify mine-like objects in the operational environment. The processing of 
sonar data can be broken into two domains. The first domain is the use of signal 
processing (mostly one-dimensional) techniques to enhance the creation of sonar 
imagery. For example, the use of adaptive beamforming techniques to enhance the 
contrast of mine-like objects in sonar imagery lies in this first domain. The second 
domain is the use of image processing (two or higher dimensional) techniques on sonar 
imagery to aid or automate the detection and classification of mine-like objects. Both of 
these domains are vitally important to mine warfare sonar; however, this report is
exclusively concerned with the second domain.

This report examines image processing techniques and methodologies that have the 
potential to aid or automate the detection and classification of mine-like objects in 
sonar imagery. Such techniques may already have shown promise in the literature and 
are worth consideration and discussion, or may be potential new methods from other 
fields that might be applied to the field of mine warfare sonar. This report is not
intended to be an exhaustive survey of the literature; instead, it is intended as a
strategic review of image processing techniques with the potential to aid mine warfare
sonar. Many of the techniques described in this report may not be feasible given
current computational restrictions, but their inclusion may still provide useful
ideas. The field of image processing is well developed, making it impossible to detail
every technique in the literature which may be of use to mine warfare sonar. It is
hoped that consideration of the techniques described in this report will encourage
further research.

This report examines image processing techniques tailored for three different types of
sonar imagery: sector-scan sonar images, side-scan sonar images, and the
three-dimensional images produced by the AMI project. The image processing techniques
examined in this report can be grouped into four categories as follows:

• Enhancement techniques: Techniques that have the potential to enhance the 
contrast of mine-like objects in sonar images. Examples of this are the removal of
noise and clutter, background normalisation, and the processing of sonar imagery
to make best use of available knowledge of the human visual system. 

• Segmentation techniques (low-level classification): Techniques that have the 
potential to classify individual pixels as belonging to background reverberation, 
clutter, highlights or shadows. This type of processing is usually not concerned 
with whether each pixel belongs to a mine-like object or not, but is often performed 
as a prelude to more advanced computer-aided detection and classification 
(CADCAC) techniques. 

• Computer-aided detection (CAD): Techniques that may be useful to detect
mine-like objects in sonar imagery. Confirmation of whether the object is actually a mine
and its specific type are left to the human operator or subsequent processing 
methods. 

• Computer-aided classification (CAC): Techniques which may be able to positively 
identify a mine-like object as a mine and determine the type and orientation of the 
mine involved. 

The above grouping is only a rough guide to image processing techniques, as a great 
deal of overlap is often found, and some techniques defy being grouped in this way. 

This report is structured into a number of sections. Section 1 is the introduction.
Section 2 looks at image processing techniques applicable to sector-scan sonar images.
Section 3 looks at image processing techniques applicable to side-scan sonar images.
Section 4 examines techniques applicable to the AMI project. Section 5 examines some
general concepts from the image processing field which may be useful to consider
while developing any system for processing sonar imagery. Section 6 concludes this
report.


2. Sector-scan Sonar 


This section will examine image processing techniques that may enhance the utility of 
sector-scan sonar systems for mine warfare sonar. 

Sector-scan sonar imagery is produced by a sensor array that electronically scans a 
horizontally narrow beam to insonify an arc in a set direction. A two-dimensional 
image results which can be used to detect mine-like objects floating in the water 
column or resting on the seabed. During the formation of the image, any movement of
the sensor array or of objects in the environment is assumed to be negligible. The
images are formed fairly rapidly, generally within a few seconds, and the human
operators watch for objects in the images that persist over successive scans. Once an
object is detected in a wide-angle view, the sonar settings may be changed to resolve a
narrow field more highly. Under the right conditions, a mine-like object may be
positively classified in this way.


2.1 General Concepts 

An important general concept for processing sector-scan sonar imagery is the fact that 
temporal information is available. Human operators detect the presence of mine-like 
objects by watching for patterns in the image that persist for successive scans. In the 
same way, the image processing techniques to be applied to these images should make 
use of this temporal information whenever possible. The most successful techniques in
the literature, as will be shown below, make use of multiple time frames to detect
mine-like objects. Hence processing algorithms for sector-scan sonar should tend to be
three-dimensional (two spatial dimensions plus time) to make best use of the
information available.

A related concept is the matter of vessel movement. If information is known about the 
motion of the vessel, this should be incorporated into the CADCAC techniques to 
simplify and speed up processing. As will be shown below, use of vessel movement 
information can aid image enhancement and CADCAC techniques by removing the 
problem of tracking targets stationary relative to the seabed. 

2.2 Enhancement 

Enhancement techniques for sector-scan images fall into two categories: techniques
that do not make use of temporal information and techniques that do. Usually
sector-scan images have a high degree of contrast, and hence most attempts at detecting
and classifying mine-like objects in these images have used only simple enhancement
techniques.

For non-temporal techniques, median filtering is common [1], [2], [3]. Median filtering
was developed to handle image noise with so-called "long tail" statistics. It suits
such noise because the median filter is a maximum likelihood estimator of a signal in the
presence of noise with a Laplacian distribution [4]. Backscatter and sonar clutter are
considered to fit in this category. For certain types of noise, median filtering can be a
powerful noise removal technique. Figures 1 and 2 illustrate the point. Figure 1 shows
a binary image corrupted by "salt and pepper" noise. In this case, 20% of pixels in the
image have been set to 1 or 0 randomly. Figure 2 shows the result of applying a 3 by 3
median filter to the corrupted image. Note that the majority of noise pixels have been
removed and the object in the image has become clearer.

Figure 1: Image degraded by noise.



Figure 2: Image cleaned by median filter. 
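
The behaviour illustrated in Figures 1 and 2 can be reproduced on a small scale. The
following is a minimal sketch only, not the filter used to produce the figures. It assumes
the image is held as a list of rows of 0/1 values, and the function name
median_filter_3x3 and the 9 by 9 test image are hypothetical choices made for
illustration.

```python
from statistics import median_low


def median_filter_3x3(image):
    """Return a copy of `image` (a list of rows) filtered with a 3 by 3 median.

    Border pixels use only the neighbours that exist inside the image.
    """
    rows, cols = len(image), len(image[0])
    filtered = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                image[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ]
            # median_low always returns a value that occurs in the window.
            filtered[r][c] = median_low(window)
    return filtered


# Hypothetical test image: a 5 by 5 square object on a 9 by 9 background.
image = [[1 if 2 <= r <= 6 and 2 <= c <= 6 else 0 for c in range(9)]
         for r in range(9)]
image[0][8] = 1  # isolated "salt" pixel in the background
image[4][4] = 0  # isolated "pepper" pixel inside the object
cleaned = median_filter_3x3(image)
```

In this example both isolated noise pixels are removed, but the corner pixel of the
square at row 2, column 2 is also set to background, since only four of the nine pixels
in its window belong to the object.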

For more complicated images, or images degraded by noise with more complex
distributions, plain median filters have a number of disadvantages. The most
important disadvantage is the relatively high computational cost. Petillot et al. found
that simple local averaging of sector-scan images could produce results similar to
median filtering at a greatly reduced computational cost [2]. This finding seems to be
at odds with the assumption that noise in sonar imagery is non-Gaussian, so local
averaging can be expected to work well only under certain conditions. Median filters
also have the disadvantage of tending to blur edges in the image, although they
preserve edges better than linear filters. In the example of Figures 1 and 2, the
median filter has degraded the corners of the object in the image. The symmetry and
sharp edges of a mine are major factors used to distinguish it from natural formations.
The current use of median filters for sector-scan imagery appears to be at odds with
this concept. Recent research in the mainstream image processing community has
detailed new forms of median filters (more generally, nonlinear filters) that have
greater ability to remove noise while preserving edges without a great increase in
computational load [5]. These filters seem not to have been applied as yet to sonar
imagery. Consideration should be given to examining the efficiency of median filters
and whether advantages can be gained by using superior median filter variants or the
wide variety of other nonlinear edge-preserving filters developed by the image
processing community to handle noise distributions with "long tail" statistics [6].

The author found little reference to the use of the wavelet transform as a possible
non-temporal noise reduction technique for sector-scan sonar imagery. The
multi-resolution nature of the wavelet transform has many similarities to fractals and
to the way humans process images, and it is finding an increasing number of
applications. The wavelet transform divides the data into a number of different
channels where each channel describes image information with a different
spatial-frequency characteristic. Figures 3 and 4 show the way a correctly tailored
wavelet transform can be used to separate an image neatly into a number of channels.
Figure 3 shows a geometrical shape that contains lines of varying widths and
directions. In the corners of the image, the lines are wide and diagonal. In the top and
bottom middle of the image, the lines are predominantly horizontal, while in the left
and right middle of the image, the lines are mostly vertical. In the centre of the image
the lines are diagonal. Figure 4 shows a single level wavelet decomposition of Figure 3
using a biorthogonal spline wavelet. Each quadrant of the image represents a different
channel of the wavelet transform. Note how the various components of the image have
been extracted to divide the vertical, horizontal and diagonal information present [7].



Figure 3: A geometrical form.

Figure 4: A Wavelet Transform of Figure 3.


Tailoring the wavelet type and size to suit the dimensions of mine-like objects in the 
image might enable the mine-like object and clutter to appear in different sets of 
channels in the wavelet decomposition. This has the potential to enable effective 
removal of the clutter without degrading the image of the mine-like object. The size of 
the base wavelet used for the noise removal could also be scaled according to the 
current sonar range setting to provide consistent performance without operator input. 
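
The separation of an image into directional channels can be illustrated with a very
small example. The sketch below uses the Haar wavelet rather than the biorthogonal
spline wavelet of Figure 4, because the Haar case can be written in a few lines; it is
an assumption made for illustration and not the transform used in this report. The
function name haar_decompose is hypothetical, and a simple averaging normalisation
(division by 4) is used instead of the orthonormal scaling.

```python
def haar_decompose(image):
    """Single-level 2-D Haar decomposition of an image with even dimensions.

    Returns four half-size channels: approximation, horizontal detail,
    vertical detail and diagonal detail.
    """
    rows, cols = len(image), len(image[0])
    approx, horiz, vert, diag = [], [], [], []
    for r in range(0, rows, 2):
        a_row, h_row, v_row, d_row = [], [], [], []
        for c in range(0, cols, 2):
            a, b = image[r][c], image[r][c + 1]
            p, q = image[r + 1][c], image[r + 1][c + 1]
            a_row.append((a + b + p + q) / 4)  # local average
            h_row.append((a + b - p - q) / 4)  # change between rows (horizontal lines)
            v_row.append((a - b + p - q) / 4)  # change between columns (vertical lines)
            d_row.append((a - b - p + q) / 4)  # diagonal structure
        approx.append(a_row)
        horiz.append(h_row)
        vert.append(v_row)
        diag.append(d_row)
    return approx, horiz, vert, diag


# Hypothetical test patterns: one-pixel-wide horizontal and vertical stripes.
stripes_h = [[r % 2] * 4 for r in range(4)]
stripes_v = [[c % 2 for c in range(4)] for _ in range(4)]
```

Horizontal stripes produce a response only in the horizontal detail channel, and
vertical stripes only in the vertical detail channel. Suppressing selected channels
before inverting the transform is the basis of the clutter removal idea discussed
above.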

Enhancement techniques that make use of temporal information have recently
appeared in the sector-scan sonar literature. Most temporal enhancement techniques
attempt to separate stationary objects from non-stationary objects and clutter. The
author believes that these methods have applications for mine warfare sonar. If the
vessel motion is known, this motion can be corrected for within the image, which
would enable mine-like objects to be treated as stationary in the computational image
domain. Temporally varying features, such as clutter, ambient biological noise, and
fish could then be suppressed. Azimi-Sadjadi et al. compared a number of successive
frames using a technique known as "Recursive High Order Correlation", an extension
of the standard concept of "cross-correlation" [8]. Although computationally intensive,
this method makes no assumptions about the objects being enhanced. Representing the
sonar information in a three-dimensional format allows simple filtering operations to
be performed in the temporal dimension of the data [9], [10], [11]. The literature
indicates that only 10 or so frames may be required to separate static objects from
non-static ones. More complicated techniques that link spatial and temporal
information are a worthwhile research direction, and the wavelet transform seems an
excellent basis.
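
As an illustration of simple filtering in the temporal dimension, the sketch below
aligns a stack of frames using known vessel motion and then takes the median of each
pixel over time. It is not one of the methods of [8] to [11]. It assumes that the
apparent seabed displacement in each frame is known as a whole number of pixels
(the hypothetical seabed_offsets argument) and that the motion is a pure translation.

```python
from statistics import median_low


def shift_frame(frame, d_row, d_col, fill=0):
    """Translate a frame by whole pixels, filling uncovered pixels with `fill`."""
    rows, cols = len(frame), len(frame[0])
    shifted = [[fill] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            rr, cc = r + d_row, c + d_col
            if 0 <= rr < rows and 0 <= cc < cols:
                shifted[rr][cc] = frame[r][c]
    return shifted


def temporal_median(frames, seabed_offsets):
    """Median over time of motion-compensated frames.

    seabed_offsets[k] is the assumed (row, col) displacement of the seabed
    in frame k relative to frame 0.
    """
    aligned = [shift_frame(frame, -d_row, -d_col)
               for frame, (d_row, d_col) in zip(frames, seabed_offsets)]
    rows, cols = len(aligned[0]), len(aligned[0][0])
    return [[median_low([a[r][c] for a in aligned]) for c in range(cols)]
            for r in range(rows)]


# Hypothetical sequence of five 6 by 8 frames.
frames, offsets = [], []
for k in range(5):
    frame = [[0] * 8 for _ in range(6)]
    frame[2][2 + k] = 1  # seabed-fixed object, drifting with the known vessel motion
    frame[4][5] = 1      # transient return that does not follow the seabed
    frames.append(frame)
    offsets.append((0, k))
enhanced = temporal_median(frames, offsets)
```

After motion compensation the seabed-fixed object occupies the same pixel in every
frame and survives the median, while the return that does not follow the seabed lands
on a different pixel in each frame and is suppressed.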

2.3 Segmentation 

Sector-scan images generally do not show shadow effects to the same degree as
side-scan imagery. For this reason, basic segmentation is usually performed by
classifying pixels as either "highlight" or "background". Chantler et al. chose an
optimal threshold for each pixel using an iterative method. The resulting binary image
can be operated on to close gaps or to remove incorrectly classified clutter [10]. The
high level of contrast of sector-scan imagery generally reduces the benefit of
sophisticated low-level segmentation techniques, rendering them for the most part
unnecessary. However, some form of adaptive thresholding, where the threshold level is
set according to the local statistical properties of the image, followed by clustering
could be beneficial.
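
One possible form of adaptive thresholding is sketched below. It is an assumption made
for illustration and not the iterative method of Chantler et al. A pixel is marked as
a highlight when it exceeds the local mean by k local standard deviations; the
window size half_window and the factor k are hypothetical tuning parameters, and
the subsequent clustering step is omitted.

```python
from statistics import mean, pstdev


def adaptive_threshold(image, half_window=2, k=1.5):
    """Mark pixels brighter than local mean + k * local standard deviation."""
    rows, cols = len(image), len(image[0])
    highlights = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                image[rr][cc]
                for rr in range(max(0, r - half_window), min(rows, r + half_window + 1))
                for cc in range(max(0, c - half_window), min(cols, c + half_window + 1))
            ]
            if image[r][c] > mean(window) + k * pstdev(window):
                highlights[r][c] = 1
    return highlights


# Hypothetical image: background brightness increases from left to right,
# with a single strong return at row 4, column 4.
image = [[10 * c for c in range(9)] for _ in range(9)]
image[4][4] = 240
highlights = adaptive_threshold(image)
```

In this example only the strong return is marked. A fixed global threshold of 50
would also mark the three brightest background columns, which is the situation the
local statistics are intended to avoid.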

2.4 Computer-Aided Detection 

Human operators use a number of visual cues to detect mine-like objects in sector-scan 
sonar imagery. Since the sonar images update every second or so, the operator looks 
for objects that remain present in many consecutive images and display a size and form 
consistent with mine-like objects. CAD techniques recently developed in the literature 
also make use of this temporal information [2], [10], [12]. 

A particularly illustrative method is that of Chantler and Stoner [13]. For objects discovered in
the sonar image, a number of static features are computed that describe the shape and 
size properties of the object. Over consecutive scans, the feature measures for each 
object are computed. For any particular object, another set of temporal features is 
determined. These temporal features describe the changes in the static features over 
time. For example, a set of returns from a diver would be expected to display a lot of 
variation over time as the diver's position shifts, hence the static features derived from 
the diver's image would vary markedly from scan to scan. However, a mine-like object
would display little variation in its returns and hence the static features derived would 
remain relatively constant. This difference may not be very prominent using the static 
features from one scan, but by creating temporal features derived from the static 
features, the differences become easier to detect. Chantler and Stoner reported a 
marked detection improvement when temporal features were used [13]. 
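
The idea of deriving temporal features from static features can be sketched as follows.
The particular static features used here (area and bounding-box elongation) and the
temporal features (mean and standard deviation over scans) are assumptions chosen
for brevity; they are not the feature set of Chantler and Stoner. Each scan is assumed
to supply a binary mask of the object.

```python
from statistics import mean, pstdev


def static_features(mask):
    """Shape and size measures of one object in one scan (binary mask)."""
    pixels = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not pixels:
        return {"area": 0, "elongation": 0.0}
    height = max(r for r, _ in pixels) - min(r for r, _ in pixels) + 1
    width = max(c for _, c in pixels) - min(c for _, c in pixels) + 1
    return {"area": len(pixels),
            "elongation": max(height, width) / min(height, width)}


def temporal_features(masks):
    """Mean and standard deviation of each static feature over consecutive scans."""
    per_scan = [static_features(mask) for mask in masks]
    features = {}
    for name in per_scan[0]:
        values = [f[name] for f in per_scan]
        features[name + "_mean"] = mean(values)
        features[name + "_std"] = pstdev(values)
    return features


def blob(n):
    """Hypothetical helper: a 4 by 4 mask with the first n pixels set."""
    mask = [[0] * 4 for _ in range(4)]
    for i in range(n):
        mask[i // 4][i % 4] = 1
    return mask


stable = [[[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]] * 4
varying = [blob(n) for n in (1, 4, 2, 6)]
stable_features = temporal_features(stable)
varying_features = temporal_features(varying)
```

The object whose return is constant from scan to scan has zero standard deviation in
its area, while the fluctuating object has a large one, so the temporal feature
separates two objects that a single scan might not.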

In an operational environment, the mine-hunting vessel is generally in motion, which
presents new opportunities or challenges. If the motion of the vessel is not known, the
mine-like objects in the image must be matched with their occurrence in subsequent
scans. This is a motion-tracking problem and has been examined in the literature. Lane
et al. and Chantler et al. reported that satisfactory results were achieved using concepts
borrowed from the field of optical flow estimation [9], [10]. To estimate optical flow, a
cost function is created which is optimised to estimate the motion of each pixel from
one image to the next image. Certain assumptions are made regarding the brightness
versus motion model used and the results typically contain some noise. By grouping
and filtering the results, the motion of objects over many frames can be determined and
those objects can be tracked. The objects that they sought to track displayed varying
temporal returns, unlike mine-like objects, so the system they developed may be more
complicated than is required for mine warfare sonar. If vessel motion is known to some
degree the problem becomes much easier, since the computer knows the predicted
location of the object in the next scan.
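
When vessel motion is known, matching an object between scans reduces to a prediction
followed by a search in a small gate. The sketch below is illustrative only. It
assumes the vessel translates without rotating between scans, that positions are
expressed in a vessel-fixed frame, and that a hypothetical gate_radius bounds the
prediction error.

```python
import math


def predict_position(position, vessel_displacement):
    """A seabed-fixed object appears to move opposite to the vessel."""
    return (position[0] - vessel_displacement[0],
            position[1] - vessel_displacement[1])


def associate(position, vessel_displacement, detections, gate_radius):
    """Return the detection closest to the predicted position, or None
    if no detection falls within the gate."""
    predicted = predict_position(position, vessel_displacement)
    best, best_distance = None, gate_radius
    for detection in detections:
        distance = math.dist(predicted, detection)
        if distance <= best_distance:
            best, best_distance = detection, distance
    return best


# Hypothetical example: the vessel advances 2 units along the second axis.
match = associate((10.0, 50.0), (0.0, 2.0), [(10.2, 48.1), (30.0, 20.0)], 1.0)
no_match = associate((10.0, 50.0), (0.0, 2.0), [(30.0, 20.0)], 1.0)
```

The first call returns the detection near the predicted position (10.0, 48.0); the
second returns None because the only detection lies outside the gate.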
Schweizer et al. used multiple detection algorithms to achieve an acceptable
classification accuracy [14]. Certain types of detectors may make better use of a priori
information, or provide alternative feature discrimination. Combining several detectors
increases the robustness of the detection results and improves accuracy.

Sector-scan CAD systems are still in their infancy and so a great deal of research can be 
done to improve them. More research needs to be done to determine the optimal 
temporal and static features to use for the detection of mine-like objects. Recently, 
certain powerful classification methods such as Residual Vector Quantisation have 
been applied to sector-scan mine warfare sonar with some success [12]. It may be 
worthwhile to investigate whether features derived from the wavelet decomposition of 
a sonar image, specifically tailored for the dimensions of mine-like objects, can form 
good discriminators. 

2.5 Computer-Aided Classification 

Once the operator has detected the presence of a mine-like object, the system is 
switched to a high-resolution mode, imaging only the vicinity of the detection. The 
operator then looks for signs that the object is actually a mine. Symmetry, the presence 
of straight or curved edges, and regular formations may be used to determine the 
identity of the target. The target may not appear to have a recognisable form in any one 
scan, so the operator may be forced to observe the object for a period of time to identify 
it. Foresti et al. extracted information about edge orientations and their angular 
relations to other edges in the object and used this information to classify targets as 
man-made or natural [15]. Foresti et al. developed a way to combine edge-detected 
information such that the effects of noise corrupting the edge orientation estimates can 
be greatly reduced and a stable estimate of the object shape can be obtained [15]. In 
terms of mine warfare sonar, a similar concept could be used to determine the exact 
shape and orientation of the mine-like object and this boundary information could be 
encoded in a suitable way and compared to a database of encoded mine shapes. In this 
way, a classification of the mine type may be possible. There is considerable literature
[16], [17], [18] in the mainstream image processing community describing robust,
orientation and scale invariant methods for encoding boundary shapes, and these
techniques may be useful to help mine warfare sonar. Ongoing research in the image
processing community into noise-resistant edge detectors [16] might also be useful to
help the operator to classify the mine-like objects detected.

3. Side-Scan Sonar 


This section will examine image processing techniques that may enhance the utility of 
side-scan sonar systems for mine warfare sonar applications. 

Side-scan sonar images are formed by a sensor array fixed on a moving platform. The 
sensor array forms a narrow image of a swath of the environment perpendicular to the
motion of the imaging platform. As the platform moves, an image of the environment
on either side of the platform is obtained. The image formation process ends when the 
imaging platform has left the zone of interest. Temporal information is used in the 
formation of a side-scan sonar image, but since only one image is obtained for each 
region within the zone of interest, the use of temporal techniques such as those 
recommended for sector-scan imagery is not possible. 

3.1 General Concepts 

Side-scan sonar images display many features similar to optical imagery from a purely 
image processing point of view. These images generally have a fairly even distribution 
of pixel values, and may often display a wide range of texture effects due to sea-floor 
characteristics. Mine-like objects in side-scan images can sometimes display low levels 
of contrast and often the acoustical shadow region appears larger and more prominent 
than the object itself. For this reason more research has been done to apply image 
processing techniques to side-scan images when compared to sector-scan images. The 
bulk of the research in the literature has concentrated on the presence of shadow effects 
produced by mine-like objects. The shadows produced by mine-like objects are often 
found to contain a great deal of information regarding the shape and size of the object, 
and it is generally considered crucial to use shadow information in any image 
processing-based CADCAC technique for side-scan imagery. 

3.2 Enhancement 

Most side-scan sonar image enhancement methods are designed to remove clutter and 
other forms of noise, without distorting or damaging the shape of the highlight and 
shadow regions associated with the mine-like object. Recently a promising technique 
has arisen which is based on concepts from the mature field of image restoration. In 
many works, clutter in side-scan images was suppressed using a concept known as 
"Total Variation Minimisation" (TVM) [19], [20], [21]. This technique is based on 
altering the image to minimise a functional consisting of two terms. The first term in 
the functional assures image fidelity and the preservation of shadow and highlight 
information. The second term in the functional is designed to smooth the image and 
hence reduce noise. The optimisation of this functional forms a multi-dimensional 
minimisation problem almost identical to the image restoration problem. In all of these 
references, TVM was shown to be very efficient at suppressing clutter in the images 
while preserving edge information for use by later CADCAC techniques. Despite 
highly impressive results, the form of TVM techniques used so far has been fairly basic. 
In the field of image processing the TVM functional is related to the "Constrained Least 
Square Error" functional, and a great deal of research has been done into adaptive, 
intelligent algorithms and cost function variants to best solve this problem [22]. The 
considerations involved are often identical; the removal of noise while preserving edge 
information. Many of these algorithms are designed to best preserve edge information 
in terms of human visual criteria. This is a vital consideration when human operators 
will examine the side-scan imagery. A worthwhile research direction would be to examine
whether adaptive algorithms could be used to improve the TVM technique for
side-scan images.
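
The two-term functional described above can be illustrated on a single scan line. The
sketch below minimises a fidelity term plus a smoothed total variation term by plain
gradient descent. It is a basic illustration under stated assumptions and not the
algorithm of [19] to [21]: the weighting lam, the smoothing constant eps, the step
size and the iteration count are hypothetical values, and a one-dimensional signal
stands in for the image.

```python
import math


def tv_energy(u, f, lam, eps):
    """Fidelity term plus lam times a smoothed total variation term."""
    fidelity = 0.5 * sum((ui - fi) ** 2 for ui, fi in zip(u, f))
    variation = sum(math.sqrt((u[i + 1] - u[i]) ** 2 + eps)
                    for i in range(len(u) - 1))
    return fidelity + lam * variation


def tv_denoise(f, lam=0.2, eps=0.01, step=0.05, iterations=500):
    """Gradient descent on tv_energy, starting from the observed signal f."""
    u = list(f)
    n = len(u)
    for _ in range(iterations):
        # Normalised differences between neighbouring samples.
        g = [(u[i + 1] - u[i]) / math.sqrt((u[i + 1] - u[i]) ** 2 + eps)
             for i in range(n - 1)]
        gradient = []
        for i in range(n):
            left = g[i - 1] if i > 0 else 0.0
            right = g[i] if i < n - 1 else 0.0
            gradient.append((u[i] - f[i]) + lam * (left - right))
        u = [ui - step * gi for ui, gi in zip(u, gradient)]
    return u


# Hypothetical scan line: a step edge with a small alternating disturbance.
signal = [0.1, -0.1, 0.1, -0.1, 0.1, 1.1, 0.9, 1.1, 0.9, 1.1]
smoothed = tv_denoise(signal)
```

The small oscillations on either side of the step are damped while the step itself
is largely retained, which is the edge-preserving behaviour that makes this family of
functionals attractive for highlight and shadow boundaries.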

Aridgides et al. used an adaptive linear finite impulse response (FIR) filter to suppress 
clutter [23]. This is a well-known technique in both the signal and image processing 
communities but requires a model of the mine-like object to be enhanced. Such filters 
can adapt to the local clutter statistics in the image to achieve good results. Fernandez 
and Aridgides extended this concept to form an adaptive order-statistic filter, which 
like the median filter is tailored for noise with long tail statistics [24]. Further research 
in this direction may be worthwhile. 

Huynh et al. examined the wavelet transform as a way to remove noise from side-scan 
sonar imagery [25]. It was shown that wavelet and wavelet packet de-noising 
techniques could improve the quality of side-scan images and the optimal wavelet type 
and size was investigated [25]. They found the best performance was obtained when 
the wavelets were tailored to the size of expected mine-like objects. This success should 
encourage the consideration of other wavelet-based noise cleaning techniques from the 
optical imagery field as suitable methods for the enhancement of side-scan images. 

3.3 Segmentation 

For side-scan sonar images, segmentation is often used to separately classify pixels as 
belonging to highlights, background, or shadow regions before higher level CADCAC 
techniques are used to search for mine-like objects. After each pixel has been classified 
into one of the three choices, the pixels are often clustered together with their 
neighbours to remove incorrectly classified pixels. There exists a large variety of image 
processing techniques for segmentation and many of these have been applied to this 
problem. 

Hoelscher-Hoebing and Kraus used "Expectation Maximisation" [26]. This is one of
many iterative approaches where pixels are classified based on how well their local
gray-level statistics match statistical models of the intended classes. Clustering is
then performed by relating the image to a Markov random field model. This method
produced interesting results, yet has the disadvantage of requiring estimates of the
probability density functions (PDFs) of the three classes involved. There are many
related Bayesian techniques for image segmentation that may be used in similar ways.

Guillaudeux et al. segmented side-scan images using a fuzzy version of k-means 
classification [27]. The use of a fuzzy technique provides a consistent framework for 
measuring how well each pixel matches each class, which can be used in an iterative 
manner to improve results. The concept of fuzzy sets is intended to mimic human 
decision making processes and has been shown to be highly effective in some cases. 
Nagao filtering was then used to group pixels and remove deviations. The Nagao filter 
uses the fuzzy set membership information to cluster pixels, while still preserving edge 
information. This direction of research may prove very useful to the problems of mine
warfare sonar, and provides a framework for understanding and improving upon the
way human operators examine such imagery.
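
A fuzzy version of k-means classification can be sketched on grey levels alone. The
code below is a standard fuzzy c-means iteration written for illustration; it is not
the implementation of Guillaudeux et al., it ignores spatial context, and the Nagao
filtering step is omitted. The fuzziness exponent, the iteration count and the initial
centres are hypothetical choices.

```python
def memberships_for(values, centres, fuzziness):
    """Degree to which each value belongs to each class (each row sums to 1)."""
    exponent = 2.0 / (fuzziness - 1.0)
    table = []
    for x in values:
        distances = [abs(x - c) for c in centres]
        if 0.0 in distances:
            # The value coincides with a centre: full membership of that class.
            row = [1.0 if d == 0.0 else 0.0 for d in distances]
        else:
            row = [d ** -exponent for d in distances]
        total = sum(row)
        table.append([m / total for m in row])
    return table


def fuzzy_c_means(values, centres, fuzziness=2.0, iterations=50):
    """Return (class centres, membership table) for scalar grey levels."""
    centres = list(centres)
    for _ in range(iterations):
        table = memberships_for(values, centres, fuzziness)
        centres = [
            sum((row[j] ** fuzziness) * x for row, x in zip(table, values))
            / sum(row[j] ** fuzziness for row in table)
            for j in range(len(centres))
        ]
    return centres, memberships_for(values, centres, fuzziness)


# Hypothetical grey levels for shadow, background and highlight pixels.
grey = [0, 1, 2, 50, 51, 52, 100, 101, 102]
centres, table = fuzzy_c_means(grey, [min(grey), (min(grey) + max(grey)) / 2, max(grey)])
labels = [row.index(max(row)) for row in table]
```

Each pixel retains a graded membership of all three classes, so a later clustering
stage can treat pixels with nearly equal memberships as uncertain rather than forcing
an early hard decision.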

Szymczak et al. examined the adaptation of the TVM approach from image 
enhancement to an image segmentation model for side-scan sonar images [19]. This 
approach is called "Mumford-Shah" segmentation and is based heavily on the TVM 
functional (see Section 3.2). Szymczak et al. achieved satisfactory segmentation results 
using this methodology [19]. This technique may be worth examining because it 
provides a way to link the enhancement and segmentation methodologies into a 
unified approach. In this way, research into techniques to improve the performance of 
TVM-related functionals in the image processing community may be applied to the 
problem of segmentation directly. 

A possibly worthwhile research concept is the use of edge information to adapt the 
segmentation process to favour the correct segmentation of man-made objects. When 
clustering the segmented pixels, the clustering procedure could make use of boundary 
and edge orientation information to determine whether an uncertain pixel should be 
added to a shadow or not. For example, if a shadow has an otherwise straight 
boundary except for a few pixels of uncertain class, an intelligent algorithm may assign 
a greater than normal probability to these pixels belonging to the shadow. Such an 
algorithm would then favour the correct segmentation of man-made objects. 

3.4 Computer-Aided Detection 

A wide variety of techniques have been used in the literature to attempt to detect
mine-like objects in side-scan sonar imagery. Dobeck et al. used a two-dimensional
non-linear matched filter [28]. The matched filter was basically a model of a mine-like
object and the results of the matched filtering were fed into a k-nearest neighbour-based
neural network classifier and an optimal discriminatory classifier. The decisions of
these two classifiers were then combined to produce a final decision. The use of a
non-linear matched filter appears problematic due to the many possible orientations of
mine-like objects and the various different conditions in the operational environment;
however, an important concept from that investigation was the use of multiple
classifiers to produce a more robust decision.

Guo and Szymczak used the wavelet transform to decompose a side-scan image into a 
number of different channels [21]. The image of an object in each channel then forms 
features for a neural network classifier. The neural network classifier uses a set of
sub-networks, each examining a different wavelet channel. This forms an interesting
"multi-resolution" neural network which detects mine-like objects based on features at 
various different resolutions. This concept is probably an important component of 
human visual detection and classification, and may be a useful research direction. 

Nelson and Tuovila used information about pixel groupings within clutter to create a 
clutter detector [29]. For each object detected in the image, a set of features was sent to
a fractal-based classifier. The classifier was designed to detect clutter and hence could
be used to remove false positives from the candidate objects. 

Calder et al. used a Bayesian classifier to detect objects against textured backgrounds in
side-scan sonar images [30]. The technique requires models of the various textures 
present and the objects to be detected, and hence is of limited utility if such models do 
not exist. In conditions where accurate statistical models do exist, Bayesian classifiers 
perform well. 
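
The dependence of a Bayesian classifier on its models can be seen in a minimal sketch. It assumes a single scalar feature with a Gaussian model for each class; the class names, model parameters and prior probabilities are hypothetical and are not taken from [30].

```python
import math


def gaussian_pdf(x, mean, std):
    """Value of a Gaussian probability density at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def posterior(x, models, priors):
    """Posterior probability of each class given the feature value x (Bayes' rule)."""
    joint = {name: priors[name] * gaussian_pdf(x, *models[name]) for name in models}
    total = sum(joint.values())
    return {name: value / total for name, value in joint.items()}


def classify(x, models, priors):
    """Return the class with the highest posterior probability."""
    post = posterior(x, models, priors)
    return max(post, key=post.get)


# Hypothetical models: (mean, standard deviation) of a scalar texture feature.
models = {"background": (10.0, 2.0), "object": (20.0, 3.0)}
priors = {"background": 0.95, "object": 0.05}
labels = [classify(x, models, priors) for x in (9.0, 15.0, 22.0)]
```

The middle value (15.0) is more likely under the object model than under the background model, yet it is labelled as background because objects are assumed to be rare. Errors in either the class models or the priors therefore translate directly into classification errors.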

As per the discussion in the previous section, mine-like objects will have a high 
probability of displaying symmetry or straight edges in their shadows. None of the 
above detectors of objects in side-scan images makes use of this fact. A simple method 
to incorporate this information would be to create a feature for each potential object 
that is weighted to indicate how many local edge orientations match others. In this way 
the detector could be weighted to detect objects with straight boundaries, and hence a 
high probability of being man-made. There are of course a variety of ways to include 
geometrical boundary properties as features and many such approaches can be found 
in the image processing literature [31]. 
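
One possible form of such a feature is sketched below, under the assumption that the shadow boundary is available as an ordered list of points. The feature is the fraction of boundary segments whose orientation falls in the most populated orientation bins; the bin width and the number of bins counted are illustrative parameters, not values from the literature.

```python
import math


def edge_orientations(boundary):
    """Orientation in degrees (0 <= angle < 180) of each segment of a closed boundary."""
    angles = []
    n = len(boundary)
    for i in range(n):
        (x0, y0), (x1, y1) = boundary[i], boundary[(i + 1) % n]
        angles.append(math.degrees(math.atan2(y1 - y0, x1 - x0)) % 180.0)
    return angles


def straightness_feature(boundary, bin_width=15.0, top_bins=2):
    """Fraction of segments that share the most common orientations."""
    n_bins = int(round(180.0 / bin_width))
    counts = [0] * n_bins
    for angle in edge_orientations(boundary):
        # Offset by half a bin so that angles near 0 and near 180 share a bin.
        index = int(((angle + bin_width / 2.0) % 180.0) // bin_width)
        counts[index % n_bins] += 1
    counts.sort(reverse=True)
    return sum(counts[:top_bins]) / float(len(boundary))


# Hypothetical boundaries: a 4 x 2 rectangle and a circle sampled at 20 points.
rectangle = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (4, 1),
             (4, 2), (3, 2), (2, 2), (1, 2), (0, 2), (0, 1)]
circle = [(math.cos(2 * math.pi * i / 20), math.sin(2 * math.pi * i / 20))
          for i in range(20)]
rectangle_score = straightness_feature(rectangle)
circle_score = straightness_feature(circle)
```

The rectangle scores 1.0 because every segment lies along one of two orientations, while the circle scores much lower because its orientations are spread evenly.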

3.5 Computer-Aided Classification 

The difficult problem of classifying mine type and orientation has not been greatly 
researched. Mignotte et al. used a genetic optimisation technique to search through a 
"template" space [32]. The technique described was not applied to mine classification 
in particular, but is instead a general method. In this template space, the shadow 
shapes from every possible mine type are stored. A number of permissible
transformations may be applied to the basic template to reflect the orientation and
range of the mine. These transformations are used as "genes" in a genetic optimisation
technique. Such techniques attempt to simulate the principles of natural selection and 
"evolve" the solution to a cost function. Such a technique may be a useful research 
direction, but genetic optimisation methods have often been criticised as being 
computationally expensive. 
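
The general idea can be sketched as follows. This is not the method of [32]: the template is assumed to be a simple rectangle, the genes are its position and size, and the cost function is the number of mismatched pixels. The population size, number of generations and mutation scheme are arbitrary illustrative choices.

```python
import random


def render(genes, size):
    """Binary mask of a rectangular template; genes = (x, y, width, height)."""
    x, y, w, h = genes
    return [[1 if x <= c < x + w and y <= r < y + h else 0
             for c in range(size)] for r in range(size)]


def cost(genes, observed):
    """Number of pixels where the transformed template and the observation differ."""
    mask = render(genes, len(observed))
    return sum(m != o for m_row, o_row in zip(mask, observed)
               for m, o in zip(m_row, o_row))


def evolve(observed, generations=40, pop_size=30, seed=0):
    """Evolve template genes towards a low cost; returns (best genes, cost history)."""
    rng = random.Random(seed)
    size = len(observed)
    population = [tuple(rng.randrange(size) for _ in range(4))
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=lambda g: cost(g, observed))
        history.append(cost(population[0], observed))
        parents = population[:pop_size // 2]  # selection; the best always survives
        children = []
        while len(parents) + len(children) < pop_size:
            mum, dad = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(mum, dad)]  # crossover
            i = rng.randrange(4)  # mutate one gene by one step
            child[i] = max(0, min(size - 1, child[i] + rng.choice((-1, 1))))
            children.append(tuple(child))
        population = parents + children
    best = min(population, key=lambda g: cost(g, observed))
    return best, history + [cost(best, observed)]


# Hypothetical observed shadow: a 6 x 4 rectangle in a 16 x 16 image.
observed = render((3, 5, 6, 4), 16)
best, history = evolve(observed)
```

Because the best individual is always retained, the cost history never increases. Every generation requires the cost function to be evaluated for the whole population, which illustrates the computational expense noted above.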

Galerne et al. extracted and encoded shadow boundaries using Fourier descriptors [33].
Fourier descriptors are a result of research in the image processing community into the
encoding of boundary shapes in a way that is invariant to scale, rotation and
translation. Given a well-segmented mine-like object, the Fourier descriptors extracted
could be compared against those present in a database to identify the mine type.
Galerne et al. used this method to classify objects as man-made or natural only, but
such a technique could be extended to classify mine types [33]. Research continues in
the image processing community into the best representation of boundary shapes for 
comparison [16]. Recent research has found that representing object contour shapes
with cubic B-spline curves [18], or with the wavelet transform [17], has many desirable
properties and can outperform Fourier descriptors.
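
The invariance properties of Fourier descriptors can be demonstrated with a minimal sketch. It uses one common normalisation (dropping the zero-frequency term, taking magnitudes, and dividing by the magnitude of the first harmonic) and assumes the boundary is traversed counter-clockwise so that the first harmonic is non-zero. The normalisation used in [33] is not stated here and may differ.

```python
import cmath
import math


def fourier_descriptors(boundary, count=8):
    """Descriptors of a closed boundary, invariant to translation, scale and rotation."""
    n = len(boundary)
    z = [complex(x, y) for x, y in boundary]
    coeffs = [sum(z[k] * cmath.exp(-2j * math.pi * u * k / n) for k in range(n)) / n
              for u in range(n)]
    # Skipping coeffs[0] removes translation, magnitudes remove rotation and
    # starting point, and dividing by |coeffs[1]| removes scale.
    scale = abs(coeffs[1])
    return [abs(c) / scale for c in coeffs[2:2 + count]]


def descriptor_distance(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def transform(boundary, angle_deg, scale, shift):
    """Rotate, scale and translate a boundary (used here only to test invariance)."""
    a = scale * cmath.exp(1j * math.radians(angle_deg))
    moved = [a * complex(x, y) + complex(*shift) for x, y in boundary]
    return [(p.real, p.imag) for p in moved]


# Hypothetical boundaries: a 4 x 2 rectangle and a 3 x 3 square, 12 points each.
rectangle = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (4, 1),
             (4, 2), (3, 2), (2, 2), (1, 2), (0, 2), (0, 1)]
square = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2),
          (3, 3), (2, 3), (1, 3), (0, 3), (0, 2), (0, 1)]
moved_rectangle = transform(rectangle, 30.0, 2.5, (7.0, -3.0))
same = descriptor_distance(fourier_descriptors(rectangle),
                           fourier_descriptors(moved_rectangle))
different = descriptor_distance(fourier_descriptors(rectangle),
                                fourier_descriptors(square))
```

The distance between the rectangle and its rotated, scaled and shifted copy is zero to within rounding error, while the distance between the rectangle and the square is clearly non-zero. A database comparison would select the stored shape with the smallest distance.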

4. The AMI Project

This section will examine image processing techniques that may enhance the utility of 
the AMI project. 

The AMI (Acoustic Mine Imaging) project involves the use of a large acoustic array to 
create a high resolution image at close range of a suspected mine for the purposes of 
positive identification. The images obtained by this method are three dimensional in 
nature, and suffer from poor contrast and the presence of side-lobe distortion. Image 
processing concepts can be used to enhance and possibly segment these images. In this 
section, computer-aided detection and classification will not be discussed since the 
detection of a mine-like object is assumed to be already performed, and the 
classification problem is assumed to be handled by a human operator. The objective 
here will therefore be the representation of the object in the best form for the human 
operator. 

At the time this report was written, the author had not had the chance to carefully 
examine the nature of the images produced by the AMI project. Image processing 
techniques considered would greatly depend on the nature of the data; in particular, 
the effective dynamic range of pixel information will have important consequences on 
the range of techniques available. A low dynamic range restricts the options to "binary 
morphology" techniques, such as dilation and erosion. 

4.1 General Concepts 

A major concept that the author feels should be reflected in any image processing 
approach to the AMI project is the full use of the three-dimensional aspect of the data. 
Special compensation can be made for the differences in the data relationships along
certain dimensions due to the position of the sensor array; however, the fundamental
three-dimensional nature of the data should be used whenever possible. The author
has had previous experience with the enhancement and segmentation of objects in
three-dimensional imagery [34], and has found that a key consideration is the time
taken to perform the image processing operation. With three-dimensional data, the
volume of data to process grows with the cube of the linear image size, and
computation time grows at least as quickly, so processing time can easily become
prohibitive.

4.2 Enhancement 

At the current time, enhancement and background correction are performed using a
split-window normaliser along each dimension separately. This is not a
three-dimensional approach and could probably be improved. For noise removal, it may be
worthwhile to examine whether three-dimensional median filter variants or wavelet
transform approaches can improve image quality. If sufficient dynamic range exists,
some methods of adaptive histogram specification may be appropriate. Adaptive
histogram specification involves analysing the histogram details in the neighbourhood
of each pixel and computing a pixel value transformation for that region which best 
emphasises desired details. This may improve the contrast of the imaged object. The 
visual representation of the information in this type of image may also benefit from the 
use of colour to represent pixel distance from the observer in the third dimension. 
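
A minimal sketch of the adaptive histogram idea is given below. It implements local histogram equalisation, which is the special case of histogram specification in which the target histogram is uniform, and it works on a two-dimensional image for brevity. The window radius and number of output levels are illustrative parameters; extending the window to a third dimension is straightforward.

```python
def local_equalise(image, radius=1, levels=256):
    """Map each pixel to its rank within the histogram of its local window."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [image[rr][cc]
                      for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                      for cc in range(max(0, c - radius), min(cols, c + radius + 1))]
            # Local cumulative histogram evaluated at the centre pixel value.
            rank = sum(1 for v in window if v <= image[r][c])
            out[r][c] = (levels - 1) * rank // len(window)
    return out


# Hypothetical low-contrast image: one pixel is two grey levels above its surroundings.
faint = [[100, 100, 100],
         [100, 102, 100],
         [100, 100, 100]]
enhanced = local_equalise(faint)
```

In this example a difference of two grey levels between the centre pixel and the corner pixels becomes a difference of 64 grey levels after the local transformation. The same mechanism also amplifies noise, so the window size needs to be chosen with care.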

Another approach that may prove useful is to weight each pixel value dependent upon 
the statistics and density characteristics of its local region. Pixels in regions whose 
characteristics do not suggest the presence of an object can have their contrast reduced, 
while pixels in regions whose statistical/density characteristics do suggest the presence 
of an object can have their contrast enhanced. 

For images with low dynamic range, noise can sometimes be suppressed by adaptively
thresholding the image and then applying binary operations such as erosion followed by
dilation [31].
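
The erosion and dilation sequence can be sketched as follows. For brevity the sketch uses a fixed global threshold rather than an adaptive one, a two-dimensional image, and a 3 x 3 square structuring element; these are simplifying assumptions, and pixels outside the image are treated as background.

```python
def threshold(image, level):
    """Binary mask of pixels at or above the given level."""
    return [[1 if v >= level else 0 for v in row] for row in image]


def neighbourhood(mask, r, c):
    """Values in the 3 x 3 neighbourhood of (r, c); outside the image counts as 0."""
    rows, cols = len(mask), len(mask[0])
    return [mask[rr][cc] if 0 <= rr < rows and 0 <= cc < cols else 0
            for rr in (r - 1, r, r + 1) for cc in (c - 1, c, c + 1)]


def erode(mask):
    """Keep a pixel only if its whole 3 x 3 neighbourhood is set."""
    return [[1 if all(neighbourhood(mask, r, c)) else 0
             for c in range(len(mask[0]))] for r in range(len(mask))]


def dilate(mask):
    """Set a pixel if any pixel in its 3 x 3 neighbourhood is set."""
    return [[1 if any(neighbourhood(mask, r, c)) else 0
             for c in range(len(mask[0]))] for r in range(len(mask))]


def erode_then_dilate(mask):
    """Erosion followed by dilation (a morphological opening)."""
    return dilate(erode(mask))


# Hypothetical image: a 3 x 3 object plus one isolated noise pixel.
image = [[0] * 7 for _ in range(7)]
for r in range(2, 5):
    for c in range(2, 5):
        image[r][c] = 9
image[0][6] = 9
cleaned = erode_then_dilate(threshold(image, 5))
```

The isolated noise pixel is removed by the erosion, and the dilation restores the object to its original extent.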

4.3 Segmentation 

To the best of the author's knowledge, three-dimensional object segmentation has not
yet been attempted on the images produced by the AMI project. A simple noise removal
technique currently used is a neighbour association algorithm applied to the data. Each
pixel has its local neighbourhood examined to determine whether other large valued
pixels lie in the immediate vicinity. If none are present, the pixel being examined is
assumed to be due to noise and removed. This seems a good approach to retain as
part of a segmentation methodology. Its result is very similar to that of
the "Erode and Dilate" noise removal technique described in the previous section.
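
A minimal three-dimensional sketch of a neighbour association step is given below. The details of the algorithm used in the AMI project are not given here, so the 26-connected neighbourhood, the definition of "large valued" as a fixed level, and the parameter names are all assumptions.

```python
def neighbour_association(volume, level, min_neighbours=1):
    """Zero any large-valued voxel with too few large-valued neighbours."""
    depth, rows, cols = len(volume), len(volume[0]), len(volume[0][0])
    offsets = [(dz, dy, dx) for dz in (-1, 0, 1) for dy in (-1, 0, 1)
               for dx in (-1, 0, 1) if (dz, dy, dx) != (0, 0, 0)]
    cleaned = [[row[:] for row in plane] for plane in volume]  # leave input intact
    for z in range(depth):
        for y in range(rows):
            for x in range(cols):
                if volume[z][y][x] < level:
                    continue  # only large-valued voxels are examined
                support = 0
                for dz, dy, dx in offsets:
                    zz, yy, xx = z + dz, y + dy, x + dx
                    if (0 <= zz < depth and 0 <= yy < rows and 0 <= xx < cols
                            and volume[zz][yy][xx] >= level):
                        support += 1
                if support < min_neighbours:
                    cleaned[z][y][x] = 0  # isolated, so assumed to be noise
    return cleaned


# Hypothetical 4 x 4 x 4 volume: two adjacent strong voxels and one isolated voxel.
volume = [[[0] * 4 for _ in range(4)] for _ in range(4)]
volume[1][1][1] = 9
volume[1][1][2] = 8
volume[3][3][3] = 9
cleaned = neighbour_association(volume, level=5)
```

The adjacent pair supports each other and is retained, while the isolated voxel is removed. Raising `min_neighbours` makes the filter more aggressive.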

The problem of segmentation is in effect a problem of classification (although
it may not always be directly treated as such) and hence a classification approach will
probably produce the optimal results. For each pixel a set of features should be
extracted detailing local statistics and density aspects in the region surrounding the 
current pixel. Research into the correct features to discriminate the object in these 
images is a logical first step to any segmentation method. These features can then be 
used to segment the image based on a set of heuristic rules or some other type of 
classifier. Using some heuristic rules to perform the segmentation is most probably 
preferable due to changing operational conditions and the speed at which processing 
needs to be done. Following segmentation, the pixels will need to be clustered to 
remove noise and incorrectly classified pixels. The method used will depend on the 
available processing power. Complicated methods of clustering based on various
models of the data (such as Markov random fields) can be used if the necessary
processing power is available. If processing power is at a premium, simple methods to 
close gaps and remove noise can be used [31]. 
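
The feature-plus-rules approach can be sketched as follows, in two dimensions for brevity. The two features (local mean and local density of large-valued pixels) and the rule thresholds are hypothetical; as noted above, identifying the correct features for AMI imagery would itself require research.

```python
def window_values(image, r, c, radius):
    """Pixel values in the square window centred on (r, c), clipped at the borders."""
    rows, cols = len(image), len(image[0])
    return [image[rr][cc]
            for rr in range(max(0, r - radius), min(rows, r + radius + 1))
            for cc in range(max(0, c - radius), min(cols, c + radius + 1))]


def pixel_features(image, r, c, radius, level):
    """Local mean and local density of large-valued pixels around (r, c)."""
    values = window_values(image, r, c, radius)
    mean = sum(values) / float(len(values))
    density = sum(1 for v in values if v >= level) / float(len(values))
    return mean, density


def segment(image, radius=1, level=5, min_mean=3.0, min_density=0.4):
    """Label a pixel as object if it is large and its surroundings support it."""
    mask = []
    for r in range(len(image)):
        row = []
        for c in range(len(image[0])):
            mean, density = pixel_features(image, r, c, radius, level)
            is_object = (image[r][c] >= level and mean >= min_mean
                         and density >= min_density)
            row.append(1 if is_object else 0)
        mask.append(row)
    return mask


# Hypothetical image: a 3 x 3 object plus one isolated bright pixel.
image = [[0] * 6 for _ in range(6)]
for r in range(1, 4):
    for c in range(1, 4):
        image[r][c] = 9
image[5][5] = 9
mask = segment(image)
```

Rules of this kind are cheap to evaluate and easy to adjust for changing conditions, which is the reason given above for preferring them when processing speed matters.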


5. General Concepts 

In this section we will examine some general concepts that the author feels need to be 
considered in any consistent system for the computer-aided detection and classification 
of mine-like objects in sonar imagery. 

5.1 The Human Interface 

A fundamental problem that occurs when dealing with imagery is the difficulty in 
quantifying the ways humans see and analyse visual information. Research has been 
conducted outside and within the DSTO to examine image processing techniques to 
render key information in an image in a form more striking to a human observer [35], 
[36], [37]. When enhancing images to help human operators detect and classify objects 
present, simply maximising the signal to noise ratio of the object may not always 
produce the best results. Such simple mathematical considerations ignore the
non-linear response of human eyesight to varying levels of brightness [31], and the use of
edges, textures and motion as detection and classification cues [38]. The human 
operator is often the most vital part of the processing system and so if aspects of the 
human component are ignored, the system can only ever perform sub-optimally. By 
the same token, imagery should not be enhanced to appear aesthetically pleasing for a 
human prior to computer-aided detection and classification (CADCAC). Image
enhancement prior to a CADCAC technique can only have its effectiveness measured 
in terms of the accuracy of the detection results. When developing algorithms to 
enhance imagery it is easy for a researcher to overlook this and design an enhancement 
algorithm purely based on what makes the image "look good". 

5.2 The Human Example 

Few CADCAC techniques can approach the performance of an expert human operator. 
Those that come close invariably do so in a limited environment very different from 
the normal visual world humans perceive. A great deal of research has been performed 
in the image processing community to discover ways to mimic human and natural 
decision making processes. Neural networks in many of their various forms are 
motivated by the structure of animal brains. Genetic optimisation techniques are 
designed to mimic natural selection. The wavelet transform is considered to be more 
compatible with the human visual system than other transforms such as the Fourier 
transform. Fuzzy set theory is designed to simulate human decision processes. 
Research continues in this direction in a number of different fields. The author believes 
that mine warfare sonar systems able to incorporate intelligent principles motivated by 
the ways humans think about and see the world will provide the best solutions to the 
difficult problem of detecting and classifying underwater mines. The reader interested 
in these concepts is referred to Russell and Lane [39] for an example of a framework for 
an intelligent system to enable an unmanned autonomous submersible to visually 
perceive its operational environment.

6. Conclusions


This report has examined various image processing techniques which have the 
potential to aid the detection and classification of mine-like objects in sonar imagery. 
Three types of sonar imagery were examined in particular:

• Sector-scan (forward looking) sonar. 

• Side-scan sonar. 

• The AMI project. 

Within each of these sonar-imaging applications, each of the four components of any 
Computer-Aided Detection and Classification (CADCAC) system was examined. These 
components are: 

• Enhancement. 

• Segmentation. 

• Computer-Aided Detection. 

• Computer-Aided Classification. 

For each of these components, image processing techniques with the potential to 
improve the performance of mine warfare sonar systems were discussed, and examples 
of successful or instructive methods from the literature were given. 

Finally, some general image processing considerations common to each imaging
methodology were given. The need to consider the human element of the mine-hunting
system was emphasised, both in its effect on the way data are processed and in the
way data are presented.

Readers interested in the state of the art for various image processing fields are referred 
to the excellent review article by Chellappa et al. [16].